HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms

Authors

  • Joo Hwan Lee
  • Nimit Nigania
  • Hyesoon Kim
  • Bevin Brett
Abstract

Heterogeneous computing has emerged as one of the major computing platforms in many domains. Although several proposals aid programming for heterogeneous platforms, optimizing applications on them remains difficult. Deciding which parallel regions (or tasks) should run on GPUs and which on CPUs is one of the critical decisions for improving performance. In this paper, we propose a profiler, HPerf, that identifies an efficient task distribution on CPU+GPU systems with low profiling overhead. HPerf is a hierarchical profiler: it first performs lightweight profiling and then, only if necessary, performs detailed profiling to measure caching and data transfer costs. Compared to a brute-force approach, HPerf significantly reduces profiling overhead, and compared to a naive decision, HPerf improves the performance of OpenCL applications by up to 25%.
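The two-level idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say when detailed profiling is triggered, so everything here is an assumption made for illustration: the names `Estimate` and `choose_device`, the two profiler callbacks, and the rule "run detailed profiling only when the lightweight CPU and GPU estimates differ by less than a relative `margin`". It is not HPerf's actual algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Estimate:
    """Estimated execution time of one task on each device, in seconds."""
    cpu_time: float
    gpu_time: float


def choose_device(
    task: str,
    lightweight: Callable[[str], Estimate],
    detailed: Callable[[str], Estimate],
    margin: float = 0.2,  # hypothetical threshold, not taken from the paper
) -> Tuple[str, bool]:
    """Return (device, used_detailed) for a task.

    Level 1: a cheap estimate. Level 2 (assumed trigger): only when the
    cheap estimates are too close to call, run the costlier profiler,
    which is expected to include caching and data transfer cost.
    """
    est = lightweight(task)
    slower = max(est.cpu_time, est.gpu_time)
    gap = abs(est.cpu_time - est.gpu_time) / slower if slower > 0 else 0.0

    used_detailed = gap < margin
    if used_detailed:
        est = detailed(task)

    device = "gpu" if est.gpu_time < est.cpu_time else "cpu"
    return device, used_detailed
```

In this sketch a clear lightweight winner never pays for the detailed pass, which is the source of the overhead saving relative to profiling every task on every device.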


Similar resources

TREES: A CPU/GPU Task-Parallel Runtime with Explicit Epoch Synchronization

We have developed a task-parallel runtime system, called TREES, that is designed for high performance on CPU/GPU platforms. On platforms with multiple CPUs, Cilk's "work-first" principle underlies how task-parallel applications can achieve performance, but work-first is a poor fit for GPUs. We build upon work-first to create the "work-together" principle that addresses the specific strengt...


Performance and Energy Aware Workload Partitioning on Heterogeneous Platforms

Heterogeneous platforms which employ a mix of CPUs and accelerators such as GPUs have been widely used in the high-performance computing area [1]. Such heterogeneous platforms have the potential to offer higher performance at lower energy cost than homogeneous platforms. However, it is rather challenging to actually achieve the high performance and energy efficiency promised by heterogeneous pl...


A Parallel Twig Join Algorithm for XML Processing using a GPGPU

With an increasing amount of data and demand for fast query processing, the efficiency of database operations continues to be a challenging task. A common approach is to leverage parallel hardware platforms. With the introduction of general-purpose GPU (Graphics Processing Unit) computing, massively parallel hardware has become available within commodity hardware. XML is based on a tree-structu...


Efficient CPU-GPU cooperative computing for solving the subset-sum problem

A heterogeneous CPU-GPU system is a powerful way to accelerate compute-intensive applications, such as the subset-sum problem. Many parallel algorithms for solving the problem have been implemented on graphics processing units (GPUs). However, these GPU implementations may fail to fully utilize all the CPU cores and the GPU resources. When the GPU performs a computational task, only one CPU core is...


Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU

Digital Breast Tomosynthesis (DBT) is a technology that creates three-dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is on the order of seconds, we can use tomosynthesis systems to perform tomosynthesis-guided interventional procedures. This research has been designed to study u...




Publication date: 2015